Building a Microblog Corpus for Search Result Diversification

Authors

Ke Tao

Claudia Hauff

Geert-Jan Houben

Abstract

Queries that users pose to search engines are often ambiguous either because different users express different query intents with the same query terms or because the query is underspecified and it is unclear which aspect of a particular query the user is interested in. In the Web search setting, search result diversification, whose goal is the creation of a search result ranking covering a range of query intents or aspects of a single topic respectively, has been shown in recent years to be an effective strategy to satisfy search engine users. We hypothesize that such a strategy will also be beneficial for search on microblogging platforms. Currently, progress in this direction is limited due to the lack of a microblog-based diversification corpus. In this paper we address this shortcoming and present our work on creating such a corpus. We are able to show that this corpus fulfils a number of diversification criteria as described in the literature. Initial search and retrieval experiments evaluating the benefits of de-duplication in the diversification setting are also reported.

Similar resources

Improving Microblog Retrieval from Exterior Corpus by Automatically Constructing Microblogging Corpus

A large-scale training corpus consisting of microblogs belonging to a desired category is important for high-accuracy microblog retrieval. Obtaining such a large-scale microblogging corpus manually is very time- and labor-consuming. Therefore, some models for the automatic retrieval of microblogs from an exterior corpus have been proposed. However, these approaches may fail in considering microblog...

Improving Microblog Retrieval from Exterior Corpus by Automatically Constructing a Microblogging Corpus

Language Differences and Metadata Features on Twitter

In the past several years, microblogging services like Twitter and Facebook have become a popular method of communication, allowing users to disseminate and gather information to and from hundreds or thousands (or even millions) of people, often in real-time. As much of the content on microblogging services is publicly accessible, we have recently seen many secondary services being built atop t...

RMIT at TREC 2011 Microblog Track

This paper describes our submission to the TREC 2011 microblog task. For the experiments, we use our new self-index search engine, NeWT, to support ranked search in the Twitter document corpus. We use a combination of phrase queries and degrading conjunctive Boolean intersection to improve retrieval effectiveness.

Keywords: self-index; full-text search; phrases; threshold; intersection

Time-Aware Latent Concept Expansion for Microblog Search

Incorporating the temporal property of words into query expansion methods based on relevance feedback has been shown to have a significant positive effect on microblog search. In contrast to such word-based query expansion methods, we propose a concept-based query expansion method based on a temporal relevance model that uses the temporal variation of concepts (e.g., terms and phrases) on micro...

Publication date: 2013
